This dataset contains keystrokes and other data from a CS1 (CS 1400) course at
Utah State University, taught in the spring and fall terms of 2019. Files include:

* keystrokes.csv - Keystroke events in a mostly ProgSnap2-compliant format. See
  https://cssplice.github.io/progsnap2/ProgSnap2-v8-18Dec2020.pdf.
  Columns that differ from the ProgSnap2 standard:
  ** SourceLocation - One-dimensional index into the linearized program text.
     The standard requires a line and column; we use a single index for
     efficiency, and to save space we omit the "text:" prefix the standard
     prescribes.
  ** CodeStateID - Code states are not maintained, so this column is always NaN.
  ** ClientTimestamp - See the note on timestamps at the end of this file.
  ** AssignmentID - Assignments are p4 through p8. A letter 's' (spring) or 'f'
     (fall) is appended to the AssignmentID (e.g., p4s, p4f) to indicate the
     term in which the assignment was worked on. This distinction is needed
     because some students took the course twice, once in each term.
  ** X-RawAssignmentID - The AssignmentID without the 's' or 'f' suffix,
     included for convenience. The assignments were identical in both terms.
  ** X-Compilable - 1 if the current state has no syntax errors, 0 otherwise.

  In this dataset, keystroke events and file-edit events are not separated:
  every keystroke produces an edit to the file, but not every edit is initiated
  by a keystroke. Edits of the latter kind have NaN in the X-Keystroke field.

  If you sort by timestamp, also sort by event ID to break ties: some pairs of
  events share the same timestamp, and their relative order must be preserved.

* keystrokes-survey.csv - Same structure as keystrokes.csv. It logs keystrokes
  from students completing a free-response survey assignment that asked about
  Phanon exercises, the tutoring lab, and other topics. The survey was given
  only at the end of the fall term, so spring students have no entries.

* students.csv - An irreconcilable error was found in students.csv and it has
  been removed from the dataset.

* due.csv - Due dates for the assignments. The "Timestamp UTC" column is Unix
  Epoch time in UTC and is provided as a convenience: it uses the same
  representation as the timestamps in keystrokes.csv, which makes it easy to
  calculate the time remaining before a due date.

* Assignment Descriptions - Descriptions for each of the assignments. Some of
  the assignments come from the book "Introduction to Programming Using Python"
  by Y. Daniel Liang, ISBN-13: 978-0-13-440024-2.

* keystrokes-raw.csv - The data logger was imperfect, and some logs were
  corrupted, so this original data is unreliable for reconstructing code. For
  digraph analysis, however, it is sound across virtually all events, and we
  make the raw keystroke data available for that purpose. It has nearly twice
  as many events as keystrokes.csv. Do NOT use the raw data for anything beyond
  analyses of single keystrokes or pairs of keystrokes.
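The sketch below illustrates two points from the list above: ordering events by timestamp with the event ID as a tie-breaker, and computing digraph latencies (time between consecutive keystrokes). It is a minimal illustration, not the authors' tooling. It assumes the standard ProgSnap2 column names EventID and SubjectID, assumes EventID values are integers, assumes keystrokes-raw.csv has the same columns as keystrokes.csv, and assumes a missing X-Keystroke value appears as an empty cell or the text NaN. The helper names are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def load_rows(path: str) -> List[Row]:
    """Read a ProgSnap2-style CSV file into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def sort_events(rows: Iterable[Row]) -> List[Row]:
    """Order events by ClientTimestamp, breaking ties with EventID.

    EventID is compared numerically (assumed integer-valued), because a
    string comparison would place "10" before "9".
    """
    return sorted(rows, key=lambda r: (float(r["ClientTimestamp"]), int(r["EventID"])))


def is_keystroke(row: Row) -> bool:
    """True if the edit was initiated by a keystroke.

    Assumption: non-keystroke edits store X-Keystroke as an empty cell or NaN.
    Whitespace keystrokes such as " " are deliberately kept.
    """
    return row.get("X-Keystroke", "") not in ("", "nan", "NaN")


def digraph_latencies(rows: Iterable[Row]) -> List[Tuple[str, str, float]]:
    """Return (first_key, second_key, latency_ms) for consecutive keystrokes.

    Non-keystroke edits are dropped first, and pairs are formed only within
    one student's work on one assignment (SubjectID, AssignmentID).
    """
    groups: Dict[Tuple[str, str], List[Row]] = defaultdict(list)
    for row in rows:
        if is_keystroke(row):
            groups[(row["SubjectID"], row["AssignmentID"])].append(row)
    pairs: List[Tuple[str, str, float]] = []
    for events in groups.values():
        ordered = sort_events(events)
        for first, second in zip(ordered, ordered[1:]):
            latency = float(second["ClientTimestamp"]) - float(first["ClientTimestamp"])
            pairs.append((first["X-Keystroke"], second["X-Keystroke"], latency))
    return pairs
```

Usage would be along the lines of `digraph_latencies(load_rows("keystrokes-raw.csv"))`. Grouping by AssignmentID (with its term suffix) keeps a student's spring and fall work separate. Very long latencies, such as breaks between sessions, are not filtered here, and most analyses would want to cap or discard them.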

All timestamps in keystrokes.csv come from Java's System.currentTimeMillis(),
which gives milliseconds since the Unix epoch (UTC). due.csv gives each due date
in local time (MST/MDT), in UTC, and as milliseconds since the Unix epoch.
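Because both files share the millisecond epoch representation, the time before a deadline is a plain subtraction. The sketch below is illustrative only. It assumes the due-date value is the millisecond epoch version from due.csv, and it leaves out matching an event to its due date, since that depends on due.csv's column layout, which is not spelled out here. The function names are hypothetical.

```python
from datetime import datetime, timezone

MS_PER_HOUR = 60 * 60 * 1000


def to_utc(epoch_ms: float) -> datetime:
    """Convert milliseconds since the Unix epoch (as returned by Java's
    System.currentTimeMillis()) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


def hours_before_due(client_ms: float, due_ms: float) -> float:
    """Hours between an event and its deadline; negative if the event came
    after the deadline. Both arguments are milliseconds since the Unix epoch."""
    return (due_ms - client_ms) / MS_PER_HOUR
```

Working in UTC epoch values avoids the MST/MDT switch entirely. Convert to local time only for display.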

A small number of fall students (S117, S193, S243) accessed assignment p8
months after completing it. We assume this is because they were retaking the
class in a later term.
